-
Notifications
You must be signed in to change notification settings - Fork 9
Build zipline-ai with Commands "zipline compile" and "zipline run" #161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe pull request introduces a comprehensive refactoring of command-line argument handling across multiple Python files, transitioning from Changes
Possibly Related PRs
Suggested Reviewers
Poem
Warning Review ran into problems🔥 ProblemsGitHub Actions: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository. Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings. Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
Dockerfile (3)
79-79: Use COPY instead of ADD to follow best practices.-ADD api/py /srv/chronon +COPY api/py /srv/chronon🧰 Tools
🪛 Hadolint (2.12.0)
[error] 79-79: Use COPY instead of ADD for files and folders
(DL3020)
80-81: Installing requirements early can benefit layer caching.
82-83: COPY for individual scripts is fine; consider consolidating copies.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (6)
Dockerfile(1 hunks)api/py/ai/chronon/repo/explore.py(2 hunks)api/py/ai/chronon/repo/run.py(2 hunks)api/py/setup.py(2 hunks)docker-compose.yml(1 hunks)quickstart/requirements.txt(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- quickstart/requirements.txt
🧰 Additional context used
🪛 Hadolint (2.12.0)
Dockerfile
[error] 79-79: Use COPY instead of ADD for files and folders
(DL3020)
🔇 Additional comments (9)
api/py/setup.py (3)
29-29: Version update is valid and aligns with semantic versioning.
55-61: Console scripts are an improvement over scripts= for modular CLI.
65-65: Renaming package to 'zipline-ai' is consistent with PR objectives.
api/py/ai/chronon/repo/run.py (2)
626-626: New main() function improves clarity and testing feasibility.
725-726: Entry point call is correct for script execution.
docker-compose.yml (1)
37-38: Switch to 'build' instead of pre-built image is good for local iteration.
Dockerfile (1)
85-85: Environment variables set the runtime path cleanly.
api/py/ai/chronon/repo/explore.py (2)
Line range hint 386-416: Encapsulating logic in a main() is nice.
This structuring makes the script more testable and clearer.
417-418: Good use of main() for script execution.
This avoids cluttering the global namespace. Verify that CLI args are parsed as intended.
✅ Verification successful
Let me check the main function implementation and CLI argument parsing.
CLI args are properly parsed and handled.
The main function correctly parses required and optional arguments using argparse:
- Required:
keywordfor lookup - Optional:
--conf-rootwith default value - Optional:
--handler-argsfor special handlers
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash
# Find the main function implementation
ast-grep --pattern 'def main():
$$$'
Length of output: 13201
Script:
#!/bin/bash
# Look for argument parsing
rg "ArgumentParser|argparse|sys.argv" -A 3
Length of output: 2673
Script:
#!/bin/bash
# Get the full content of explore.py to understand the context
cat api/py/ai/chronon/repo/explore.py
Length of output: 14140
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
api/py/ai/chronon/repo/explore.py (2)
Line range hint
387-417: LGTM! Consider adding error handling.Clean refactoring to improve modularity. Consider wrapping the main execution in a try-except block to handle potential exceptions gracefully.
def main(): + try: parser = argparse.ArgumentParser(description="Explore tool for chronon") parser.add_argument("keyword", help="Keyword to look up keys") parser.add_argument("--conf-root", help="Conf root for the configs", default=CWD) parser.add_argument( "--handler-args", nargs="*", help="Special arguments for handler keywords of the form param=value") args = parser.parse_args() root = args.conf_root if not (root.endswith("chronon") or root.endswith("zipline")): print("This script needs to be run from chronon conf root - with folder named 'chronon' or 'zipline', found: " + root) teams = load_team_data(os.path.join(root, 'teams.json')) gb_index = build_index("group_bys", GB_INDEX_SPEC, root=root, teams=teams) join_index = build_index("joins", JOIN_INDEX_SPEC, root=root, teams=teams) enrich_with_joins(gb_index, join_index, root=root, teams=teams) candidate = args.keyword if candidate in handlers: print(f"{candidate} is a registered handler") handler = handlers[candidate] handler_args = {} for arg in args.handler_args: splits = arg.split("=", 1) assert len(splits) == 2, f"need args to handler for the form, param=value. Found and invalid arg:{arg}" key, value = splits handler_args[key] = value handler(**handler_args) else: group_bys = find_in_index(gb_index, args.keyword) display_entries(group_bys, args.keyword, root=root, trim_paths=True) + except Exception as e: + print(f"Error: {e}") + return 1 + return 0
419-420: LGTM! Consider using the return code.Clean entry point. Consider using the return code from main().
if __name__ == "__main__": - main() + exit(main())api/py/ai/chronon/repo/run.py (1)
295-298: Adopt ternary for brevity.Use the suggested ternary operator for simpler logic:
-if params["env"]: - context = params["env"] -else: - context = "dev" +context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
295-298: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (5)
api/py/ai/chronon/repo/compile.py(1 hunks)api/py/ai/chronon/repo/explore.py(2 hunks)api/py/ai/chronon/repo/run.py(10 hunks)api/py/ai/chronon/repo/zipline.py(1 hunks)api/py/setup.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
🔇 Additional comments (9)
api/py/ai/chronon/repo/zipline.py (3)
1-2: Looks good.Imports are straightforward and aligned with the new CLI usage.
7-9: Neat CLI group definition.A clear group name "zipline" neatly organizes subcommands.
12-13: Commands bound successfully.Subcommands integrate well via
add_command.api/py/ai/chronon/repo/compile.py (1)
51-51: Command name changed to 'compile'.This improves the CLI usage consistency and readability.
api/py/setup.py (3)
55-59: Entry point is well-defined.Centralized CLI entry point with
ziplineis a good move for discoverability.
60-60: Updated package description.Switched to "Zipline python API library" aligns with rebranding.
63-63: Renamed package to "zipline-ai".This completes the naming alignment with the new brand.
api/py/ai/chronon/repo/run.py (2)
627-669: CLI restructured with Click.Main function and options are well-organized and easy to extend.
679-685: Jar download logic.The fallback to
download_jaris clear, ensuring correctness for embedded vs. uber jars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
api/py/ai/chronon/repo/run.py (4)
295-298: Simplify context assignment using ternary operator-if params["env"]: - context = params["env"] -else: - context = "dev" +context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
295-298: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
383-391: Extract online jar fetching logicMove the online jar fetching logic into a separate method for better readability and maintainability.
def __init__(self, args, jar_path): self.online_jar = args["online_jar"] - valid_jar = args["online_jar"] and os.path.exists(args["online_jar"]) - if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar: - print("Downloading online_jar") - self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode( - "utf-8" - ) - os.environ["CHRONON_ONLINE_JAR"] = self.online_jar - print("Downloaded jar to {}".format(self.online_jar)) + self._fetch_online_jar_if_needed(args) +def _fetch_online_jar_if_needed(self, args): + valid_jar = args["online_jar"] and os.path.exists(args["online_jar"]) + if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar: + print("Downloading online_jar") + self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode("utf-8") + os.environ["CHRONON_ONLINE_JAR"] = self.online_jar + print("Downloaded jar to {}".format(self.online_jar))
601-624: Consider centralizing default valuesDefault values are currently set in both
set_defaultsand click options. Consider moving all defaults to a central configuration.Also applies to: 626-665
283-290: Enhance error message for invalid conf pathAdd more context to the error message by including the expected path format.
- logging.error( - "Invalid conf path: {}, please ensure to supply the relative path to zipline/ folder".format( - params["conf"] - ) - ) + logging.error( + "Invalid conf path: {}. Expected format: <context>/<conf_type>/<team>/<name> relative to zipline/ folder".format( + params["conf"] + ) + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
api/py/ai/chronon/repo/run.py(8 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
api/py/ai/chronon/repo/run.py (6)
196-200: Add version format validationAdd input validation for the version parameter to ensure it matches the expected format (e.g., X.Y.Z).
def download_jar( version, jar_type="uber", release_tag=None, spark_version="2.4.0", skip_download=False, +): + if version and version != "latest": + assert re.match(r'^\d+\.\d+\.\d+$', version), f"Invalid version format: {version}. Expected format: X.Y.Z" ):
295-298: Simplify context assignmentUse a ternary operator for better readability.
- if params["env"]: - context = params["env"] - else: - context = "dev" + context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
295-298: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
384-390: Improve online jar validation error handlingAdd explicit error message when online jar validation fails.
valid_jar = args["online_jar"] and os.path.exists(args["online_jar"]) # fetch online jar if necessary - if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar: + if self.mode in ONLINE_MODES and not args["sub_help"]: + if not valid_jar and not args["online_jar_fetch"]: + raise ValueError(f"Online jar not found at {args['online_jar']} and no fetch command provided") print("Downloading online_jar")
579-580: Add date format validationAdd explicit date format validation before processing.
+ def validate_date_format(date_str): + try: + return datetime.strptime(date_str, "%Y-%m-%d") + except ValueError: + raise ValueError(f"Invalid date format: {date_str}. Expected format: YYYY-MM-DD") + + start_date = validate_date_format(start_date) + end_date = validate_date_format(end_date) - start_date = datetime.strptime(start_date, "%Y-%m-%d") - end_date = datetime.strptime(end_date, "%Y-%m-%d")
627-668: Improve help messages clarityThe help messages for some options could be more descriptive.
-@click.option("--ds", help="the end partition to backfill the data", default=datetime.today().strftime("%Y-%m-%d")) +@click.option("--ds", help="End partition date for data backfill (format: YYYY-MM-DD)", + default=datetime.today().strftime("%Y-%m-%d"))
679-688: Move jar path expansion earlierMove the path expansion before the jar download to avoid potential issues.
jar_path = ( - chronon_jar + os.path.expanduser(chronon_jar) if chronon_jar else download_jar( version, jar_type=jar_type, release_tag=release_tag, spark_version=os.environ.get("SPARK_VERSION", spark_version), ) ) - Runner(ctx.params, os.path.expanduser(jar_path)).run() + Runner(ctx.params, jar_path).run()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
api/py/ai/chronon/repo/run.py(10 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
api/py/ai/chronon/repo/run.py (2)
Line range hint
244-273: Improve error handling in environment setup.The environment setup lacks proper error handling for missing or invalid environment variables.
Add try-catch blocks and validation:
def set_runtime_env(params): + if not isinstance(params, dict): + raise ValueError("params must be a dictionary") effective_mode = params["mode"] + if effective_mode is None: + raise ValueError("mode parameter is required")🧰 Tools
🪛 Ruff (0.8.2)
295-298: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
Line range hint
488-497: Add retry mechanism for streaming app verification.The streaming app verification lacks retry logic, which could fail due to temporary network issues.
+@retry_decorator(retries=3, backoff=20) +def verify_running_apps(app_name, list_apps_cmd): + running_apps = check_output("{}".format(list_apps_cmd)).decode("utf-8").split("\n") + return [json.loads(app.strip()) for app in running_apps if app.strip()] if len(filtered_apps) > 0: if self.mode == "streaming": assert len(filtered_apps) == 1, "More than one found, please kill them all"
🧹 Nitpick comments (3)
api/py/ai/chronon/repo/run.py (3)
295-298: Simplify context assignment using ternary operator.-if params["env"]: - context = params["env"] -else: - context = "dev" +context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
295-298: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
627-688: Enhance CLI documentation and type validation.The CLI options lack type validation and comprehensive documentation.
@click.option("--parallelism", + type=click.IntRange(min=1), help="break down the backfill range into this number of tasks in parallel. " "Please use it along with --start-ds and --end-ds and only in manual mode") @click.option("--repo", + type=click.Path(exists=True, file_okay=False, dir_okay=True), help="Path to chronon repo", default=os.environ.get("CHRONON_REPO_PATH", "."))
Line range hint
579-597: Optimize date range splitting for large ranges.The date range splitting function could be optimized for memory efficiency with large date ranges.
def split_date_range(start_date, end_date, parallelism): + if not isinstance(parallelism, int) or parallelism < 1: + raise ValueError("Parallelism must be a positive integer") start_date = datetime.strptime(start_date, "%Y-%m-%d") end_date = datetime.strptime(end_date, "%Y-%m-%d") - date_ranges = [] + for i in range(parallelism): + split_start = start_date + timedelta(days=i * split_size) + split_end = end_date if i == parallelism - 1 else split_start + timedelta(days=split_size - 1) + yield (split_start.strftime("%Y-%m-%d"), split_end.strftime("%Y-%m-%d"))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
api/py/ai/chronon/repo/run.py(11 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
| self.repo = args["repo"] | ||
| self.conf = args["conf"] | ||
| self.sub_help = args["sub_help"] | ||
| self.mode = args["mode"] | ||
| self.online_jar = args["online_jar"] | ||
| valid_jar = args["online_jar"] and os.path.exists(args["online_jar"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate input parameters in Runner initialization.
The Runner class accepts parameters without validation.
Add parameter validation:
def __init__(self, args, jar_path):
+ if not isinstance(args, dict):
+ raise ValueError("args must be a dictionary")
+ if not jar_path or not isinstance(jar_path, str):
+ raise ValueError("jar_path must be a non-empty string")
self.repo = args["repo"]Committable suggestion skipped: line range outside the PR's diff.
…wys/setup_python_build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
api/py/ai/chronon/repo/run.py (2)
295-298: Simplify context assignment using ternary operator.-if params["env"]: - context = params["env"] -else: - context = "dev" +context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
295-298: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
673-675: Add docstring to main function.Add documentation describing the function's purpose and parameters.
def main(ctx, conf, env, mode, ds, app_name, start_ds, end_ds, parallelism, repo, online_jar, online_class, version, spark_version, spark_submit_path, spark_streaming_submit_path, online_jar_fetch, sub_help, conf_type, online_args, chronon_jar, release_tag, list_apps, render_info): + """Execute the Chronon pipeline with the specified configuration. + + Args: + ctx: Click context object + conf: Configuration path + env: Environment (dev/prod) + [...] + """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
api/py/ai/chronon/repo/compile.py(1 hunks)api/py/ai/chronon/repo/run.py(11 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- api/py/ai/chronon/repo/compile.py
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
🔇 Additional comments (2)
api/py/ai/chronon/repo/run.py (2)
583-584: LGTM: Correct handling of inclusive date ranges.
383-388: 🛠️ Refactor suggestionAdd parameter validation in Runner initialization.
Add checks for required parameters to prevent runtime errors.
def __init__(self, args, jar_path): + if not isinstance(args, dict): + raise ValueError("args must be a dictionary") + if not jar_path: + raise ValueError("jar_path is required") self.repo = args["repo"] self.conf = args["conf"]Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (6)
api/py/ai/chronon/repo/run.py (2)
295-296: Simplify context assignment using ternary operator.-if params["env"]: - context = params["env"] -else: - context = "dev" +context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
295-298: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
605-632: Consider environment variable prefix constants.Define a constant for the environment variable prefix "CHRONON_" to avoid string duplication.
+ENV_PREFIX = "CHRONON_" def set_defaults(ctx): """Set default values based on environment""" - chronon_repo_path = os.environ.get("CHRONON_REPO_PATH", ".") + chronon_repo_path = os.environ.get(f"{ENV_PREFIX}REPO_PATH", ".")api/py/test/test_run.py (4)
31-47: Consider using a pytest fixture.
This helper could be converted into a fixture for more flexible reuse.
83-128: Reduce environment reset duplication.
Repeated code for resetting environment variables can be placed in a shared setup or fixture.Also applies to: 144-149, 159-164, 179-183, 188-192
169-175: Use a more specific exception.
CatchingExceptionis too broad. Consider a narrower exception class for clarity.🧰 Tools
🪛 Ruff (0.8.2)
169-169:
pytest.raises(Exception)should be considered evil(B017)
Line range hint
271-328: Consider unifying repeated steps.
Streaming and streaming-client tests share logic. A helper could reduce duplication.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
api/py/ai/chronon/repo/run.py(11 hunks)api/py/test/test_run.py(8 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
api/py/test/test_run.py
169-169: pytest.raises(Exception) should be considered evil
(B017)
🔇 Additional comments (9)
api/py/ai/chronon/repo/run.py (5)
196-200: LGTM: Download jar function signature and validation.The function correctly validates spark versions against supported versions.
Also applies to: 203-203
383-388: Add input validation in Runner initialization.The past review comment about validating input parameters in Runner initialization is still valid.
Also applies to: 390-392
410-412: LGTM: Mode validation logic.The code properly validates modes against possible modes for the configuration type.
Also applies to: 415-416
583-584: LGTM: Date range calculation.The date range calculation correctly includes the end date.
634-701: LGTM: Click command setup.The command-line interface is well-structured with appropriate options and defaults.
api/py/test/test_run.py (4)
19-19: Good import for Click usage.
197-210: Looks good.
No issues spotted with property defaults logic.
213-241: Clean parameter management.
Render info testing appears consistent with the new approach.
Line range hint
244-264: No concerns.
Tests properly validate rendering script usage.
| effective_mode = params["mode"] | ||
| if effective_mode and "streaming" in effective_mode: | ||
| effective_mode = "streaming" | ||
| if args.repo: | ||
| teams_file = os.path.join(args.repo, "teams.json") | ||
| if params["repo"]: | ||
| teams_file = os.path.join(params["repo"], "teams.json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for missing repo path.
The code assumes repo path exists without validation.
if params["repo"]:
+ if not os.path.exists(params["repo"]):
+ raise ValueError(f"Repository path does not exist: {params['repo']}")
teams_file = os.path.join(params["repo"], "teams.json")Also applies to: 281-287
| self.args = args["args"] if args["args"] else "" | ||
| self.online_class = args["online_class"] | ||
| self.app_name = args["app_name"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate paths before assignment.
File paths should be validated before assignment.
-self.spark_submit = args["spark_submit_path"]
+if not os.path.exists(args["spark_submit_path"]):
+ raise ValueError(f"Spark submit path does not exist: {args['spark_submit_path']}")
+self.spark_submit = args["spark_submit_path"]Also applies to: 430-430, 433-435, 437-438
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
api/py/ai/chronon/repo/run.py (3)
196-200: Add input validation for spark_version.The assertion message is good, but add upfront validation.
def download_jar( version, jar_type="uber", release_tag=None, spark_version="2.4.0", skip_download=False, ): + if not isinstance(spark_version, str): + raise ValueError(f"spark_version must be a string, got {type(spark_version)}") assert ( spark_version in SUPPORTED_SPARK ), f"Received unsupported spark version {spark_version}. Supported spark versions are {SUPPORTED_SPARK}"Also applies to: 203-203
295-296: Use ternary operator for context assignment.-if params["env"]: - context = params["env"] -else: - context = "dev" +context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
295-298: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
583-584: Use datetime.timedelta directly for date difference.Simplify the date range calculation.
- total_days = ( - end_date - start_date - ).days + 1 + total_days = (end_date - start_date).days + 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
api/py/ai/chronon/repo/run.py(11 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
295-298: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
🔇 Additional comments (5)
api/py/ai/chronon/repo/run.py (5)
270-274: Add path validation for repo and conf.The code assumes paths exist without proper validation.
Also applies to: 281-287
383-388: Add input validation for Runner initialization.Also applies to: 390-392
415-416: Validate paths before assignment.Also applies to: 418-418, 421-422, 426-428, 430-430, 433-435, 437-438
605-632: LGTM! Good default configuration setup.The function properly handles environment variables and default paths.
634-696: Well-structured CLI using Click.Good job on:
- Comprehensive options with clear help messages
- Environment variable fallbacks
- Flexible argument handling with
allow_extra_args
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
api/py/ai/chronon/repo/run.py (1)
Line range hint
307-311: Add path traversal protection.Validate conf_path to prevent directory traversal attacks.
conf_path = os.path.join(params["repo"], params["conf"]) +conf_path = os.path.abspath(conf_path) +if not conf_path.startswith(os.path.abspath(params["repo"])): + raise ValueError("Invalid conf path: Attempted path traversal") if os.path.isfile(conf_path):🧰 Tools
🪛 Ruff (0.8.2)
300-303: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
🧹 Nitpick comments (2)
api/py/ai/chronon/repo/run.py (2)
300-303: Simplify context assignment.Use a ternary operator for better readability.
-if params["env"]: - context = params["env"] -else: - context = "dev" +context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
300-303: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
793-827: Add docstring and type hints to main function.The main function would benefit from:
- A comprehensive docstring explaining the command's purpose
- Type hints for all parameters
@click.command(name="run", context_settings=dict(allow_extra_args=True, ignore_unknown_options=True)) +def main( + ctx: click.Context, + conf: Optional[str], + env: str, + # ... other parameters with type hints +) -> None: + """Execute Chronon jobs with specified configuration and parameters. + + Args: + ctx: Click context for handling command state + conf: Configuration path relative to repo + env: Environment (dev/prod) + # ... document other parameters + """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
api/py/ai/chronon/repo/run.py(12 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: no_spark_scala_tests
🔇 Additional comments (2)
api/py/ai/chronon/repo/run.py (2)
388-393: Add parameter validation in Runner initialization.Critical parameters should be validated to prevent runtime errors.
64-65: Verify Spark 3.5.1 compatibility.The addition of Spark 3.5.1 support needs validation.
✅ Verification successful
Spark 3.5.1 support properly configured
- Build files correctly define and use Spark 3.5.1
- Version validation and Scala mappings are consistent
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if Spark 3.5.1 is properly supported across the codebase rg -g '*.{scala,java}' 'spark.*3\.5\.1'Length of output: 41
Script:
#!/bin/bash # Check build files and Python code for Spark version references echo "=== Build Files ===" fd -e xml -e sbt . | xargs rg "3\.5\.1|spark" echo -e "\n=== Python Files ===" rg -g "*.py" "spark.*version|SUPPORTED_SPARK" echo -e "\n=== Test Files ===" fd -e py "test_" | xargs rg "spark.*version|3\.5\.1"Length of output: 5601
Script:
#!/bin/bash # Check for version-specific implementations rg -g "*.{scala,py}" "(?:if|match).*(?:spark|version).*3\." -A 3Length of output: 66
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
api/py/ai/chronon/repo/run.py (1)
388-393:⚠️ Potential issueValidate input parameters in Runner initialization.
Add parameter validation for args and jar_path.
🧹 Nitpick comments (1)
api/py/ai/chronon/repo/run.py (1)
300-303: Use ternary operator for conciseness.-if params["env"]: - context = params["env"] -else: - context = "dev" +context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
300-303: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
api/py/ai/chronon/repo/run.py(12 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: no_spark_scala_tests
🔇 Additional comments (3)
api/py/ai/chronon/repo/run.py (3)
64-65: LGTM: Spark 3.5.1 support added correctly.
794-846: LGTM: Well-structured click command setup.Command options are comprehensive and well-documented.
275-279:⚠️ Potential issueAdd path validation.
Validate repository path before accessing files.
if params["repo"]: + if not os.path.exists(params["repo"]): + raise ValueError(f"Repository path does not exist: {params['repo']}") teams_file = os.path.join(params["repo"], "teams.json")Likely invalid or redundant comment.
| if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar: | ||
| print("Downloading online_jar") | ||
| self.online_jar = check_output("{}".format(args.online_jar_fetch)).decode( | ||
| self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode( | ||
| "utf-8" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for jar download.
Add try-except block to handle download failures gracefully.
if (self.mode in ONLINE_MODES) and (not args["sub_help"]) and not valid_jar:
print("Downloading online_jar")
+ try:
self.online_jar = check_output("{}".format(args["online_jar_fetch"])).decode(
"utf-8"
)
+ except subprocess.CalledProcessError as e:
+ raise RuntimeError(f"Failed to download online jar: {e}")
os.environ["CHRONON_ONLINE_JAR"] = self.online_jarCommittable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
api/py/ai/chronon/repo/run.py (2)
288-294: Improve error message specificity.The error message could be more helpful by suggesting the correct path format.
- logging.error( - "Invalid conf path: {}, please ensure to supply the relative path to zipline/ folder".format( - params["conf"] - ) - ) + logging.error( + "Invalid conf path: {}. Expected format: <context>/<conf_type>/<team>/<name>".format( + params["conf"] + ) + )
300-303: Simplify context assignment.Use a ternary operator for cleaner code.
- if params["env"]: - context = params["env"] - else: - context = "dev" + context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
300-303: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
api/py/ai/chronon/repo/run.py(12 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: no_spark_scala_tests
🔇 Additional comments (5)
api/py/ai/chronon/repo/run.py (5)
21-21: LGTM! Good choice of CLI framework.The transition to Click is a good choice as it provides better CLI structuring capabilities.
388-393: Add input validation.Previous review comment about parameter validation remains applicable.
794-828: LGTM! Well-structured CLI implementation.The click command is well-organized with descriptive help texts and appropriate option types.
431-443:⚠️ Potential issueValidate paths before assignment.
Add existence checks for spark_submit paths.
+ def _validate_path(path, name): + if not os.path.exists(path): + raise ValueError(f"Invalid {name}: {path}") + if self.mode == "streaming": + _validate_path(args["spark_streaming_submit_path"], "spark streaming submit path") self.spark_submit = args["spark_streaming_submit_path"] elif self.mode == "info": assert os.path.exists( args["render_info"] ), "Invalid path for the render info script: {}".format(args["render_info"]) self.render_info = args["render_info"] else: + _validate_path(args["spark_submit_path"], "spark submit path") self.spark_submit = args["spark_submit_path"]Likely invalid or redundant comment.
64-65: Verify Spark 3.5.1 compatibility.While adding Spark 3.5.1 support is good, ensure all features work with this version.
Run this to check for potential compatibility issues:
✅ Verification successful
Spark 3.5.1 compatibility verified successfully.
The codebase properly handles Spark 3.5.x specific changes and uses stable APIs. Version-specific features like error messages are correctly implemented in tests.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for Spark version-specific code or configurations rg -A 3 "spark.*3\.[0-9]"Length of output: 5105
Script:
#!/bin/bash # Check for SQL queries and potential deprecated features rg -g '*.{scala,java,py}' '(?i)spark\.sql|createDataFrame|withColumn|udf|rdd\.' -A 2 # Look for test configurations fd -e scala -e java test$ -x cat {}Length of output: 76729
Script:
#!/bin/bash # Check for test configurations and Spark version settings rg -g '*.{scala,java,xml,conf}' 'SparkSession\.builder|spark\.version|spark-version'Length of output: 361
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
api/py/requirements/base.txt (1)
1-7: Consider pinning pip-compile-multi versionAdd pip-compile-multi version to dev requirements for reproducible builds.
api/py/ai/chronon/repo/run.py (2)
300-303: Simplify context assignment using ternary operator.-if params["env"]: - context = params["env"] -else: - context = "dev" +context = params["env"] if params["env"] else "dev"🧰 Tools
🪛 Ruff (0.8.2)
300-303: Use ternary operator
context = params["env"] if params["env"] else "dev"instead ofif-else-blockReplace
if-else-block withcontext = params["env"] if params["env"] else "dev"(SIM108)
826-828: Add type hints to main function parameters.Improve code maintainability by adding type hints.
-def main(ctx, conf, env, mode, dataproc, ds, app_name, start_ds, end_ds, parallelism, repo, online_jar, online_class, - version, spark_version, spark_submit_path, spark_streaming_submit_path, online_jar_fetch, sub_help, conf_type, - online_args, chronon_jar, release_tag, list_apps, render_info): +def main(ctx: click.Context, conf: str, env: str, mode: str, dataproc: bool, ds: str, app_name: str, start_ds: str, + end_ds: str, parallelism: int, repo: str, online_jar: str, online_class: str, version: str, + spark_version: str, spark_submit_path: str, spark_streaming_submit_path: str, online_jar_fetch: str, + sub_help: bool, conf_type: str, online_args: str, chronon_jar: str, release_tag: str, + list_apps: str, render_info: str) -> None:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
api/py/ai/chronon/repo/run.py(12 hunks)api/py/requirements/base.txt(1 hunks)
🧰 Additional context used
🪛 LanguageTool
api/py/requirements/base.txt
[duplication] ~24-~24: Possible typo: you repeated a word.
Context: ...rage google-cloud-core==2.4.1 # via google-cloud-storage google-cloud-storage==2.19.0 # via -r requirements/base....
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~30-~30: Possible typo: you repeated a word.
Context: ...ia # google-cloud-storage # google-resumable-media google-resumable-media==2.7.2 # via google-cloud-storage g...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~57-~57: Possible typo: you repeated a word.
Context: ...# via google-auth six==1.17.0 # via thrift thrift==0.21.0 # via -r requirements/base....
(ENGLISH_WORD_REPEAT_RULE)
🪛 Ruff (0.8.2)
api/py/ai/chronon/repo/run.py
300-303: Use ternary operator context = params["env"] if params["env"] else "dev" instead of if-else-block
Replace if-else-block with context = params["env"] if params["env"] else "dev"
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: no_spark_scala_tests
🔇 Additional comments (5)
api/py/requirements/base.txt (2)
50-54: Check requests package security advisoriesVersion 2.32.3 is recent but verify security status.
✅ Verification successful
Requests package 2.32.3 is secure
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for CVEs gh api graphql -f query=' { securityVulnerabilities(first: 5, ecosystem: PIP, package: "requests", severities: [CRITICAL, HIGH]) { nodes { advisory { summary severity } vulnerableVersionRange } } }'Length of output: 426
25-26: Verify google-cloud-storage version compatibilityVersion 2.19.0 is relatively recent but check compatibility with your Python version.
✅ Verification successful
Python version compatibility verified
The google-cloud-storage 2.19.0 requirement (Python >=3.7) matches the project's Python version specification.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check minimum Python version requirement curl -s https://pypi.org/pypi/google-cloud-storage/2.19.0/json | jq '.info.requires_python'Length of output: 99
Script:
#!/bin/bash # Find Python version specs fd -e toml -e py -e cfg -e txt -e yaml -e yml -x grep -l "python" {} \; | xargs -I {} sh -c 'echo "=== {}:"; cat {}'Length of output: 68129
api/py/ai/chronon/repo/run.py (3)
64-65: LGTM: Spark 3.5.1 support added correctly.The mapping to Scala 2.12 for Spark 3.5.1 is accurate.
388-393: Validate input parameters in Runner initialization.The Runner class still accepts parameters without validation.
275-278:⚠️ Potential issueAdd path validation for repo directory.
Add existence check for the repo directory before accessing files within it.
effective_mode = params["mode"] if effective_mode and "streaming" in effective_mode: effective_mode = "streaming" +if params["repo"] and not os.path.exists(params["repo"]): + raise ValueError(f"Repository path does not exist: {params['repo']}") if params["repo"]:Likely invalid or redundant comment.
) ## Summary Creates console command "zipline" with subcommands "compile" and "run". This change refactors run.py from argparse to click so both can be commands for a click group. This change also updates the pypi package name to zipline-ai. ## Checklist - [ ] Added Unit Tests - [ x] Covered by existing CI - [ x] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes - **New Features** - Introduced a new command-line interface (CLI) using the `click` library. - Added `zipline` command group with `extract_and_convert` and `run_main` commands. - **Improvements** - Enhanced command-line argument parsing and handling. - Updated package metadata and versioning. - Expanded supported Spark versions. - **Package Changes** - Renamed package from "chronon-ai" to "zipline-ai". - Updated package description and version. - **Breaking Changes** - Modified command-line argument structure. - Replaced `argparse` with `click` library for argument management. - **Dependency Updates** - Added multiple new dependencies, including `google-cloud-storage`. - Removed several dependencies related to Google Cloud services from development requirements. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --- - To see the specific tasks where the Asana app for GitHub is being used, see below: - https://app.asana.com/0/0/1209001371150237 --------- Co-authored-by: tchow <[email protected]>
) ## Summary Creates console command "zipline" with subcommands "compile" and "run". This change refactors run.py from argparse to click so both can be commands for a click group. This change also updates the pypi package name to zipline-ai. ## Checklist - [ ] Added Unit Tests - [ x] Covered by existing CI - [ x] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes - **New Features** - Introduced a new command-line interface (CLI) using the `click` library. - Added `zipline` command group with `extract_and_convert` and `run_main` commands. - **Improvements** - Enhanced command-line argument parsing and handling. - Updated package metadata and versioning. - Expanded supported Spark versions. - **Package Changes** - Renamed package from "chronon-ai" to "zipline-ai". - Updated package description and version. - **Breaking Changes** - Modified command-line argument structure. - Replaced `argparse` with `click` library for argument management. - **Dependency Updates** - Added multiple new dependencies, including `google-cloud-storage`. - Removed several dependencies related to Google Cloud services from development requirements. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --- - To see the specific tasks where the Asana app for GitHub is being used, see below: - https://app.asana.com/0/0/1209001371150237 --------- Co-authored-by: tchow <[email protected]>
) ## Summary Creates console command "zipline" with subcommands "compile" and "run". This change refactors run.py from argparse to click so both can be commands for a click group. This change also updates the pypi package name to zipline-ai. ## Checklist - [ ] Added Unit Tests - [ x] Covered by existing CI - [ x] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes - **New Features** - Introduced a new command-line interface (CLI) using the `click` library. - Added `zipline` command group with `extract_and_convert` and `run_main` commands. - **Improvements** - Enhanced command-line argument parsing and handling. - Updated package metadata and versioning. - Expanded supported Spark versions. - **Package Changes** - Renamed package from "chronon-ai" to "zipline-ai". - Updated package description and version. - **Breaking Changes** - Modified command-line argument structure. - Replaced `argparse` with `click` library for argument management. - **Dependency Updates** - Added multiple new dependencies, including `google-cloud-storage`. - Removed several dependencies related to Google Cloud services from development requirements. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --- - To see the specific tasks where the Asana app for GitHub is being used, see below: - https://app.asana.com/0/0/1209001371150237 --------- Co-authored-by: tchow <[email protected]>
) ## Summary Creates console command "zipline" with subcommands "compile" and "run". This change refactors run.py from argparse to cliour clients so both can be commands for a cliour clients group. This change also updates the pypi paour clientsage name to zipline-ai. ## Cheour clientslist - [ ] Added Unit Tests - [ x] Covered by existing CI - [ x] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes - **New Features** - Introduced a new command-line interface (CLI) using the `cliour clients` library. - Added `zipline` command group with `extract_and_convert` and `run_main` commands. - **Improvements** - Enhanced command-line argument parsing and handling. - Updated paour clientsage metadata and versioning. - Expanded supported Spark versions. - **Paour clientsage Changes** - Renamed paour clientsage from "chronon-ai" to "zipline-ai". - Updated paour clientsage description and version. - **Breaking Changes** - Modified command-line argument structure. - Replaced `argparse` with `cliour clients` library for argument management. - **Dependency Updates** - Added multiple new dependencies, including `google-cloud-storage`. - Removed several dependencies related to Google Cloud services from development requirements. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --- - To see the specific tasks where the Asana app for GitHub is being used, see below: - https://app.asana.com/0/0/1209001371150237 --------- Co-authored-by: tchow <[email protected]>
Summary
Creates console command "zipline" with subcommands "compile" and "run". This change refactors run.py from argparse to click so both can be commands for a click group. This change also updates the pypi package name to zipline-ai.
Checklist
Summary by CodeRabbit
Release Notes
New Features
clicklibrary.ziplinecommand group withextract_and_convertandrun_maincommands.Improvements
Package Changes
Breaking Changes
argparsewithclicklibrary for argument management.Dependency Updates
google-cloud-storage.